Abstract: Platelet glycoprotein 4 (CD36) (or fatty acyl translocase [FAT], or scavenger
receptor class B, member 3 [SCARB3]) is an essential cell surface and skeletal muscle
outer mitochondrial membrane glycoprotein involved in multiple functions in the body.
CD36 serves as a receptor for thrombospondin, long chain fatty acids, oxidized low
density lipoproteins (LDLs) and malaria-infected erythrocytes. CD36 also influences
various physiological and disease processes, including angiogenesis, thrombosis,
atherosclerosis, malaria, diabetes, steatosis, dementia and obesity. Genetic deficiency of
this protein results in significant changes in fatty acid and oxidized lipid uptake.
Comparative CD36 amino acid sequences and structures and CD36 gene locations were
examined using data from several vertebrate genome projects. Vertebrate CD36 sequences
shared 53-100% identity as compared with 29-32% sequence identities with other CD36-like
superfamily members, SCARB1 and SCARB2. At least eight vertebrate CD36 N-glycosylation
sites were conserved, which are required for membrane integration. Sequence alignments,
key amino acid residues and predicted secondary structures were also studied. Three CD36
domains were identified: cytoplasmic, transmembrane and exoplasmic sequences. Conserved
sequences included N- and C-terminal transmembrane glycines; exoplasmic cysteine residues
forming disulphide bonds; TSP-1 and PE binding sites, Thr92 and His242, respectively;
17 conserved proline and 14 glycine residues, which may participate in forming CD36
'short loops'; and basic amino acid residues, which may contribute to fatty acid and
thrombospondin binding. Vertebrate CD36 genes usually contained 12 coding exons. The human
CD36 gene contained transcription factor binding sites (including PPARG and PPARA)
contributing to a high gene expression level (6.6 times average). Phylogenetic analyses
examined the relationships and potential evolutionary origins of the vertebrate CD36 gene
with vertebrate SCARB1 and SCARB2 genes. These suggested that CD36 originated in an
ancestral genome and was subsequently duplicated to form three vertebrate CD36 gene
family members, SCARB1, SCARB2 and CD36.
1. Introduction

Platelet glycoprotein 4 (CD36; cluster of differentiation 36), also known as fatty acyl translocase
(FAT) and scavenger receptor class B, member 3 (SCARB3), is one of at least three members of the
CD36-like family. It is an integral membrane protein of many tissues of the body which plays a role in
fatty acyl translocation and acts as a multiple ligand cell surface receptor for oxidized LDL
lipoproteins (ox-LDL), long chain fatty acids, aged neutrophils and Plasmodium falciparum-parasitized
erythrocytes (PE). CD36 has been implicated in several diseases, including insulin resistance, diabetes,
atherosclerosis and malaria [1-10]. CD36 has also been reported on the outer mitochondrial membrane of
skeletal muscle, where it serves a long chain fatty acid transport role and contributes to the regulation
of fatty acid oxidation by muscle mitochondria [11]. In addition, CD36 contributes to cerebrovascular
oxidative stress and neurovascular dysfunction induced by amyloid-beta in Alzheimer's dementia [12,13]
and may serve a 'lipid-sensing' role in the body, with a broad physiological role as a lipid-receptor
protein which influences eating behavior and energy balance [14]. Moreover, a specific CD36-dependent
signaling pathway has been proposed for platelet activation by ox-LDL [15].

SCARB1 (also called CLA1, SRB1 and CD36L1), a second member of the CD36-like family, is a 
homo-oligomeric plasma membrane cell surface glycoprotein receptor for high density lipoprotein 
cholesterol (HDL), other phospholipid ligands and chylomicron remnants [16-20]. SCARB2 (also 
called LIMP2 (lysosomal integral membrane protein), SRB2 and CD36L2) is a third member of the 
CD36 family predominantly integrated within lysosomal and endosomal membranes which contributes 
to lysosomal membrane organization and transport functions [21-25]. 

The gene encoding CD36 (CD36 in humans; Cd36 in mice) is localized on chromosome 7q11.2
and is encoded by 15 exons, including 12 coding exons [26-29]. Human CD36 is expressed at very
high levels in various cells and tissues of the body, including platelets, monocytes/macrophages, and 
microvascular endothelial cells, plays important roles in atherosclerosis, inflammation, thrombosis and 
angiogenesis [4,6,7,30-32], and is upregulated in human monocytes following statin administration [33]. 
Studies of Cd36-/Cd36- knockout mice have shown that CD36 deficiency protects against
Western-type diet related cardiac dysfunction [34-36] and contributes to a reduction in fatty acid
oxidation by muscle mitochondria [11,37]. Human clinical studies have also examined CD36
polymorphisms associated with enhanced atherosclerotic cardiovascular diseases [38,39], type II
diabetes [9], oral fat perception, fat preference and obesity in African-Americans [40] and protection
from malaria [41,42]. In addition, hepatic CD36 upregulation has been shown to be associated with 
insulin resistance, hyperinsulinaemia, and increased steatosis in patients with non-alcoholic 
steatohepatitis and chronic hepatitis C [43]. Reviews of the role of macrophage human CD36 in 
atherosclerosis have been published [7,44]. 

This paper reports the predicted gene structures and amino acid sequences for several vertebrate
CD36 genes and proteins, the secondary structures for vertebrate CD36 proteins, several potential sites
for regulating human CD36 gene expression, and the structural, phylogenetic and evolutionary
relationships of these genes and proteins with the vertebrate CD36, SCARB1 and SCARB2
gene families.

2. Results and Discussion 

2.1. Alignments of Vertebrate CD36 Amino Acid Sequences 

The deduced amino acid sequences for cow (Bos taurus), opossum (Monodelphis domestica), 
chicken (Gallus gallus), frog (Xenopus tropicalis) and zebrafish (Danio rerio) CD36 are shown in 
Figure 1 together with previously reported sequences for human and mouse CD36 (Table 1) [45,46]. 
The human and other vertebrate CD36 sequences examined were 53-100% identical in these alignments,
suggesting that these are products of the same family of genes, whereas comparisons of
vertebrate CD36 proteins with human SCARB1 and SCARB2 proteins showed lower
levels of sequence identity (30-32%), indicating that these are members of distinct CD36-like gene
families (Supplementary Table 1).
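
The percentage identities quoted above are derived from pairwise comparisons of aligned sequences. A minimal sketch of one common way to compute such a value is given below. It assumes that the two sequences are already aligned, that gaps are written as "-", and that identity is expressed over the columns in which neither sequence has a gap; the function name and the denominator convention are illustrative assumptions, not the procedure reported for Supplementary Table 1. The two example strings are the first residues of the human and cow CD36 sequences as they appear in Figure 1.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Illustrative fragments only (N-terminal residues from Figure 1).
human_fragment = "M-GCDRNCGLIA"
cow_fragment = "M-GCNRNCGLIA"
print(f"{percent_identity(human_fragment, cow_fragment):.1f}")  # 90.9
```

Other denominators (alignment length, or the length of the shorter sequence) are also in use and give slightly different values, so the convention should be stated whenever identities are compared between studies.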

The amino acid sequences for eutherian mammalian CD36 contained 472 residues, whereas 
opossum (Monodelphis domestica), platypus (Ornithorhynchus anatinus) and chicken (Gallus gallus) 
CD36 sequences contained 471 residues, while frog (Xenopus tropicalis) and zebrafish (Danio rerio) 
CD36 sequences contained 470 and 465 amino acids, respectively (Table 1; Figure 1). Previous studies 
have reported several key regions and residues for human and mouse CD36 proteins (human CD36
amino acid residues were identified in each case). These included cytoplasmic N-terminal and
C-terminal residues: residues 2-6 and 462-472; N-terminal and C-terminal trans-membrane helical
regions: residues 7-28 and 440-461 [32,45]; palmitoylated cysteine residues (Cys3; Cys7; Cys464; and
Cys466) in the N- and C-terminal CD36 cytoplasmic tails [47]; exoplasmic Thr92, which is
phosphorylated by protein kinase C alpha and contributes to the suppression of thrombospondin-1
binding in vitro [48]; His242, which contributes to the CD36-dependent adherence of
Plasmodium falciparum-parasitized erythrocytes to endothelial cells [4]; and six exoplasmic
disulfide bond forming residues: Cys243, Cys272, Cys311, Cys313, Cys322 and Cys333 [49].
Table 1. CD36, SCARB1 and SCARB2 genes and proteins. RefSeq: the reference amino acid sequence; ¹predicted Ensembl amino acid
sequence; na-not available; GenBank IDs are derived from NCBI http://www.ncbi.nlm.nih.gov/genbank/; Ensembl ID was derived from
Ensembl genome database http://www.ensembl.org; * designates scaffold; Un refers to unknown chromosome; UNIPROT refers to
UniprotKB/Swiss-Prot IDs for individual CD36-like proteins (see http://kr.expasy.org); bps refers to base
pairs of nucleotide sequences; the number of coding exons is listed; gene expression levels are in bold.



| CD36 Gene | Species | RefSeq ID (¹Ensembl/NCBI) | GenBank ID | UNIPROT ID | Amino acids | Chromosome location | Coding Exons (strand) | Gene Size bps | Gene Expression Level |
|---|---|---|---|---|---|---|---|---|---|
| Human | Homo sapiens | NM_001001547 | BC008406 | P16671 | 472 | 7:80,275,645-80,303,732 | 12 (+ve) | 72,231 | 6.6 |
| Chimpanzee | Pan troglodytes | ¹XP_519573 | na | na | 472 | 7:81,142,402-81,169,764 | 12 (+ve) | #27,363 | na |
| Orangutan | Pongo abelii | ¹XP_002818343 | na | na | 472 | 7:95,750,733-95,779,630 | 12 (-ve) | #28,898 | na |
| Gibbon | Nomascus leucogenys | ¹XP_003252221 | na | na | 472 | *GL397261:11,570,433-11,598,114 | 12 (+ve) | #27,682 | na |
| Rhesus | Macaca mulatta | NP_001028085 | na | na | 472 | 3:136,626,102-136,653,066 | 12 (+ve) | #27,682 | na |
| Mouse | Mus musculus | NM_001159555.1 | BC010262 | Q08857 | 472 | 5:17,291,543-17,334,712 | 12 (-ve) | 43,170 | 4.2 |
| Rat | Rattus norvegicus | NP_113749 | L19658 | Q07969 | 472 | 4:13,472,534-13,522,334 | 12 (+ve) | 49,801 | 0.3 |
| Guinea Pig | Cavia porcellus | ¹XP_003469862 | na | na | 472 | *31:20,074,611-20,098,210 | 12 (+ve) | #23,600 | na |
| Cow | Bos taurus | NM_17410 | BC103112 | P26201 | 472 | 4:40,585,624-40,614,621 | 12 (-ve) | #28,998 | na |
| Dog | Canis familiaris | NM_001177734 | ADE58431 | na | 472 | 18:23,334,171-23,360,045 | 12 (+ve) | #25,875 | na |
| Pig | Sus scrofa | NP_001038087 | AK400585 | Q3HUX1 | 472 | 9:93,204,848-93,241,842 | 12 (-ve) | #36,995 | na |
| Rabbit | Oryctolagus cuniculus | ¹XP_002712062 | na | na | 472 | 7:35,303,111-35,333,630 | 12 (-ve) | #30,520 | na |
| Horse | Equus caballus | ¹XP_001487957 | na | na | 472 | 4:6730,96-698,607 | 12 (-ve) | #25,512 | na |
| Elephant | Loxodonta africana | ¹XP_003407226 | na | na | 472 | 5:69,036,730-69,073,879 | 12 (-ve) | #37,150 | na |
| Opossum | Monodelphis domestica | ¹XP_001364375 | na | na | 471 | 8:149,041,138-149,075,533 | 12 (-ve) | #34,396 | na |
| Platypus | Ornithorhynchus anatinus | ¹XP_001506583 | na | na | 471 | *Ultra5:3,505,963-3,536,963 | 12 (-ve) | #31,001 | na |
| Chicken | Gallus gallus | ¹ENSGALG8439 | AJ719746 | F1NER9 | 471 | 1:12,077,308-12,107,415 | 12 (-ve) | 30,108 | na |
| Lizard | Anolis carolinensis | ¹XP_003221568 | na | na | 472 | 5:93,087,943-93,120,933 | 12 (-ve) | #32,991 | na |
| Frog | Xenopus tropicalis | NP_001107151 | na | na | 470 | *GL172681:663,550-679,762 | 12 (-ve) | #16,213 | na |
| Zebrafish | Danio rerio | NP_001002363.1 | BC076048 | Q6DHC7 | 465 | 4:21,594,449-21,606,961 | 12 (-ve) | 12,513 | na |



Biomolecules 2012, 2 






Table 1. Cont. 



| SCARB1 Gene | Species | RefSeq ID (¹Ensembl/NCBI) | GenBank ID | UNIPROT ID | Amino acids | Chromosome location | Coding Exons (strand) | Gene Size bps | Gene Expression Level |
|---|---|---|---|---|---|---|---|---|---|
| Human | Homo sapiens | NM_00505 | BC022087 | Q8WVT0 | 509 | 12:125,267,232-125,348,266 | 12 (-ve) | 81,035 | 13.7 |
| Mouse | Mus musculus | NM_001205082.1 | BC004656 | Q61009 | 509 | 5:125,761,478-125,821,252 | 12 (-ve) | 63,985 | 5.1 |
| Chicken | Gallus gallus | ¹XP_415106 | na | na | 503 | 15:4,543,054-4,558,954 | 12 (+ve) | 15,901 | na |
| Zebrafish | Danio rerio | NM_198121 | BC044516 | E7FB50 | 496 | 11:21,526,513-21,572,478 | 12 (-ve) | 45,684 | na |

| SCARB2 Gene | Species | RefSeq ID (¹Ensembl/NCBI) | GenBank ID | UNIPROT ID | Amino acids | Chromosome location | Coding Exons (strand) | Gene Size bps | Gene Expression Level |
|---|---|---|---|---|---|---|---|---|---|
| Human | Homo sapiens | NM_005506 | BT006939 | Q53Y63 | 478 | 4:77,084,378-77,134,696 | 12 (-ve) | 50,316 | 3.2 |
| Mouse | Mus musculus | NM_007644 | BC029073 | O35114 | 478 | 5:92,875,330-92,934,334 | 12 (-ve) | 59,005 | 3.6 |
| Chicken | Gallus gallus | ¹XP_42()93.1 | BX931548 | na | 481 | 4:51,411,268-51,429,620 | 12 (+ve) | 18,353 | na |
| Zebrafish | Danio rerio | NM_173259.1 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] (+ve) | [illegible] | na |

| CD36-like Gene | Species | RefSeq ID (¹Ensembl/NCBI) | GenBank ID | UNIPROT ID | Amino acids | Chromosome location | Coding Exons (strand) | Gene Size bps | Gene Expression Level |
|---|---|---|---|---|---|---|---|---|---|
| Lancelet | Branchiostoma floridae | ¹XP_002609178.1 | na | na | 480 | Un:534,334,234-534,343,082 | 12 (+ve) | 8,849 | na |
| Sea squirt | Ciona intestinalis | ¹XP_002127015.1 | na | na | 523 | 09p:2,872,362-2,873,903 | 1 (-ve) | 1,542 | na |
| Nematode | Caenorhabditis elegans | NM_067224 | na | Q9XTT3 | 534 | III:12,453,609-12,456,726 | 8 (+ve) | 3,118 | 4.6 |
| Fruit fly | Drosophila melanogaster | NP_523859 | na | na | 520 | 2R:20,864,606-20,867,116 | 6 (-ve) | #2,511 | na |
Figure 1. Amino Acid Sequence Alignments for Vertebrate CD36 Sequences. See Table 1 for sources of CD36 sequences; Hu-human;
Mo-mouse; Co-cow; Op-opossum; Ch-chicken; Fr-frog; Zf-zebrafish; * shows identical residues for CD36 subunits; : similar alternate
residues; . dissimilar alternate residues; predicted cytoplasmic residues are shown in red; predicted transmembrane residues are shown in blue;
N-glycosylated and potential N-glycosylated Asn sites are in green; exoplasmic Thr92, which is phosphorylated by protein kinase C alpha, is
also highlighted; predicted disulfide bond Cys residues are shown in blue; predicted α-helices for vertebrate CD36 are in shaded yellow and
numbered in sequence from the start of the predicted exoplasmic domain; predicted β-sheets are in shaded grey and also numbered in
sequence; bold underlined font shows residues corresponding to known or predicted exon start sites; exon numbers refer to human CD36 gene
exons; [G] residues refer to conserved glycines in the N- and C-terminal oligomerisation domains of the trans-membrane sequence [49]; CD36
binding domains are identified: THP-refers to binding region for low-density lipoproteins [6-8]; neutrophil phagocytosis domain designated
by [3,7]; PE binding refers to cytoadherence region of Plasmodium falciparum-parasitized erythrocytes (PE) to endothelial cells [4].



[Figure 1 alignment: human (Hu), mouse (Mo), cow (Co), opossum (Op), chicken (Ch), frog (Fr) and zebrafish (Zf) CD36 sequences,
annotated with exons 1-12, predicted α-helices (α1-α5) and β-strands (β1-β15), and the THP-binding, neutrophil phagocytosis and
PE-binding domains; sequence lengths are 472, 472, 472, 471, 471, 470 and 465 residues, respectively.]
2.2. Comparative Sequences for Vertebrate CD36 N-Glycosylation Sites

Ten exoplasmic N-glycosylation sites have been previously identified for human CD36
(Figure 1; Table 2) [50]. One of these sites (site 2) contained a proline residue at the second
position and may not function as an N-glycosylation site due to proline-induced inaccessibility [51].
Eight of these sites were predominantly retained among the 19 vertebrate CD36 sequences examined
(sites 4, 5, 10, 15, 19, 23 and 25) (Figure 1; Table 2). The sequence conservation observed for these
residues among the vertebrate CD36 sequences examined suggests that they contribute significantly to
the structure and function of vertebrate CD36 as a glycoprotein. The multiple N-glycosylation sites
observed for vertebrate CD36 sequences suggest a role for N-proteoglycan residues exposed on the
external surface of plasma membranes in the performance of CD36 functions in binding various lipid
molecules, including long chain fatty acids. This is also supported by recent animal model studies
examining the impacts of reduced N-glycosylation upon cardiac long chain fatty acid metabolism, which
demonstrated a key role for N-glycosylation in the recruitment of CD36 into cardiac membranes [52].
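
The potential N-glycosylation sites discussed above follow the Asn-X-Ser/Thr pattern, with a proline at the second position expected to interfere with glycosylation. The sketch below scans a protein sequence for this motif and flags proline-containing sites. It is a simple pattern search written for illustration and is not the NetNGlyc 1.0 predictor used for Table 2, which also assigns a probability to each site; the function name and the example sequence are hypothetical.

```python
import re

# Asn-X-Ser/Thr motif; the lookahead allows overlapping motifs to be reported.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(sequence: str):
    """Return (1-based Asn position, tripeptide, proline_at_second_position) tuples."""
    sites = []
    for match in SEQUON.finditer(sequence.upper()):
        motif = match.group(1)
        sites.append((match.start() + 1, motif, motif[1] == "P"))
    return sites


# Hypothetical toy sequence containing one ordinary and one proline-containing motif.
for position, motif, has_proline in find_sequons("AANGTTAANPSA"):
    note = "proline at second position" if has_proline else "candidate site"
    print(position, motif, note)
# 3 NGT candidate site
# 9 NPS proline at second position
```

A motif match only indicates a potential site; whether a given asparagine is actually glycosylated depends on the structural context and needs experimental confirmation.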

2.3. Conserved Glycines in the N-Terminal Domain of the CD36 Trans-Membrane Sequence 

The N-terminal region for vertebrate CD36 sequences (residues 1-29 for human CD36) contained
cytoplasmic (residues 2-7) and trans-membrane (residues 8-29) motifs which underwent changes in
amino acid sequence but retained predicted cytoplasmic and trans-membrane properties in each case,
respectively (Figure 1). Vertebrate N-terminal trans-membrane sequences, in particular, were
predominantly conserved, especially for CD36 Gly12, Gly16 and Gly24/Gly25 residues, which were
observed among the vertebrate CD36 sequences examined (Figure 1). Site-directed mutagenesis
studies of the related human SCARB1 sequence have demonstrated key roles for N-terminal
trans-membrane glycine residues in facilitating oligomerisation and selective lipid uptake
by SCARB1 [53], and similar roles may apply to the conserved C-terminal
domain CD36 glycine residues. A recent report has shown, however, that CD36 is capable of binding
acetylated and oxidized low-density lipoproteins as a monomer, even though multiple homo- and
hetero-protein interactions are formed in the plasma membrane [8]. A conserved glycine residue was
also observed for the vertebrate C-terminal trans-membrane sequences (human CD36 Gly452)
(Figure 1); however, the role of this residue has not been investigated.
Table 2. Predicted N-glycosylation sites for CD36 sequences. Numbers refer to amino acids in the amino acid sequences, including N-asparagine;
K-lysine; I-isoleucine; H-histidine; S-serine; T-threonine; Q-glutamine; D-aspartate; Y-tyrosine; and V-valine. Note that there are 25 potential
sites identified for vertebrate CD36 and other CD36-like sequences, including 10 sites for human CD36 (see [49]). N-glycosylation sites were
identified using the NetNGlyc 1.0 web server (http://www.cbs.dtu.dk/services/NetNGlyc/). Higher probability N-glycosylation sites are in bold.
Danio rerio
77NGTT
100NVTY
105NDST
2.4. Conserved Vertebrate CD36 Cysteine Residues

Ten cysteine residues of the vertebrate CD36 sequences were conserved, including two within each
of the N- (Cys3 and Cys7) and C-terminal (Cys464 and Cys466) cytoplasmic sequences, and six
within the vertebrate exoplasmic sequences (Cys243; Cys272; Cys311; Cys313; Cys322; and Cys333)
(Figure 1). The CD36 N- and C-terminal conserved cytoplasmic cysteine residues have been shown to
be palmitoylated [47], which may contribute to protein-protein interactions, protein trafficking and
membrane localization [54]. Comparative studies of vertebrate SCARB1 sequences have shown that
the N- and C-terminal cytoplasmic sequences lacked any conserved cysteine residues [55].
The six conserved exoplasmic vertebrate CD36 cysteine residues participate in disulfide bridge
formation for bovine CD36 (Cys243-Cys311; Cys272-Cys333; and Cys313-Cys322), resulting in a
1-3, 2-6 and 4-5 arrangement of the disulfide bridges [49]. In contrast, vertebrate SCARB1 exoplasmic
sequences contain only four conserved cysteine residues forming disulfide bridges (Cys281; Cys321;
Cys323; and Cys334); a fifth cysteine (Cys251), which was not conserved among vertebrate SCARB1
sequences [55]; and a conserved sixth cysteine (human SCARB1 Cys384, not observed in the CD36
sequence), which functions in lipid transfer activity [56,57].
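
The 1-3, 2-6 and 4-5 notation above ranks the six exoplasmic cysteines by their position in the sequence and then states which ranks are paired. A minimal sketch of that conversion is shown below, using the bovine CD36 bridges quoted in the text; the function name is a hypothetical helper introduced for illustration.

```python
def bridge_pattern(bridges):
    """Convert disulfide bridges given as residue numbers into ordinal pairs.

    Cysteines are ranked by sequence position (1 = closest to the N-terminus),
    and each bridge is reported as a (lower rank, higher rank) pair.
    """
    cysteines = sorted(position for pair in bridges for position in pair)
    rank = {position: index + 1 for index, position in enumerate(cysteines)}
    return sorted((min(rank[a], rank[b]), max(rank[a], rank[b])) for a, b in bridges)


# Bovine CD36 disulfide bridges as listed in the text [49].
bovine_cd36_bridges = [(243, 311), (272, 333), (313, 322)]
print(bridge_pattern(bovine_cd36_bridges))  # [(1, 3), (2, 6), (4, 5)]
```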

2.5. Predicted Secondary Structures for Vertebrate CD36

Predicted secondary structures for vertebrate CD36 sequences were examined (Figure 1),
particularly for the exoplasmic sequences. α-Helix and β-sheet structures were similar in each case,
with an α-helix extending beyond the N-terminal and C-terminal trans-membrane regions, forming α1
and α5, respectively. A consistent sequence of predicted secondary structure was observed for each
of the vertebrate CD36 sequences: N-terminal cytoplasmic sequence--N-terminal trans-membrane
sequence--α1--β1--α2--β2--β3--β4--β5--β6--α3--β7--α4--β8--β9--β10--β11--β12--β13--β14--β15--α5--C-terminal
trans-membrane sequence--C-terminal cytoplasmic sequence. Further description of the secondary and
tertiary structures for CD36 must await the determination of the three dimensional structure for this
protein, particularly for the exoplasmic region which directly binds oxidized LDL lipids and a wide
range of other lipid-like structures, including long chain fatty acids [1-10].

2.6. Conserved Proline, Glycine and Charged Amino Acid Residues within the CD36 Exoplasmic Domain

Supplementary Figure 1 shows the alignment of 7 vertebrate CD36 amino acid sequences for the
exoplasmic domain with colors depicting the properties of individual amino acids and conservation
observed for some of these protein sequences. In addition to the key vertebrate CD36 amino acids
detailed previously, others were also conserved, including 17 proline residues. A human CD36 genetic
deficiency involving substitution of one of these conserved prolines (Pro90→Ser), which resulted in
a lack of platelet CD36, confirmed the significance of this residue [56]. Human CD36 deficiency has
been shown to cause systemic
metabolic changes in glucose and long chain fatty acid metabolism [59]. Prolines play a major role in
protein folding and protein-protein interactions, involving the cyclic pyrrolidine amino acid side chain,
which may introduce turns (or kinks) in the polypeptide chain as well as having destabilizing effects
on α-helix and β-strand conformations [60]. In addition, the presence of sequential prolines within a
protein sequence may confer further restriction in folding conformation and create a distinctive
structure, such as that reported for the mammalian Na+/H+ exchanger, which plays a major role in
cation transport [61]. Sequential prolines (Pro258-Pro259) were conserved for 6 of 7 vertebrate CD36
sequences examined and these may confer a distinctive conformation in this region supporting the lipid
receptor functions for this protein. Moreover, regions of water-exposed proteins with high levels of
proline residues are often sites for protein-protein interactions [62] and these residues may significantly
contribute to the binding of lipoproteins by the exoplasmic region of CD36. Similar results have been
recently reported for the vertebrate SCARB1 exoplasmic region; however, in this case, 30 conserved
proline residues were observed [55].
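
Counting conserved prolines (or glycines) amounts to finding the alignment columns in which every sequence carries that residue, and sequential prolines are adjacent conserved columns. The sketch below illustrates this under the assumption that the alignment is supplied as equal-length strings; the function names and the three toy sequences are hypothetical and are not taken from Supplementary Figure 1.

```python
def conserved_columns(alignment, residue):
    """Return the 1-based alignment columns where every sequence has `residue`."""
    if len({len(sequence) for sequence in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    target = residue.upper()
    return [
        index + 1
        for index, column in enumerate(zip(*alignment))
        if all(symbol.upper() == target for symbol in column)
    ]


def sequential_pairs(columns):
    """Return pairs of adjacent columns, e.g. a conserved Pro-Pro doublet."""
    return [(a, b) for a, b in zip(columns, columns[1:]) if b == a + 1]


# Hypothetical toy alignment with a conserved Pro-Pro doublet in columns 2-3.
toy_alignment = ["SPPFVEKP", "FPPFVEKP", "HPPFLDKG"]
proline_columns = conserved_columns(toy_alignment, "P")
print(proline_columns)                    # [2, 3]
print(sequential_pairs(proline_columns))  # [(2, 3)]
```

Column numbers refer to the alignment, so they need to be mapped back to residue numbers in a reference sequence (for example human CD36) before being compared with positions quoted in the text.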

Supplementary Figure 1 also shows conservation of 14 glycine residues for vertebrate CD36
exoplasmic domains, which due to their small size, may be essential for static turns, bends or close
packing in the domain, or required for conformational dynamics during long chain fatty acid receptor
on-off switching, as in the case of the aspartate receptor protein [63]. Both proline and glycine residues
are frequently found in turn and loop structures of proteins, and usually influence short loop formation
within proteins containing between 2 and 10 amino acids [61]. These short loop structures
within vertebrate CD36 exoplasmic sequences were apparent from the predicted secondary structures for
vertebrate CD36 (Figure 1), with proline and/or glycine residues found at the start of the following
structures: α1 (Pro28; Gly30), β1 (Gly58), α2 (Pro73), β3 (Gly89-Pro90), β8 (Gly210), β12 (Gly287)
and α5 (Gly420; Gly423). Moreover, CD36 sequential proline residues (Pro255-Pro256) were located
in a region with no predicted secondary structure (between β9 and β10) but with disulfide bonds,
which suggests that this is a region of conformational significance for CD36.

In addition to the proline and glycine residues for the vertebrate exoplasmic CD36 sequences,
there are several conserved charged amino acid residue positions, including positively charged
Lys40/Lys41 located within the first predicted exoplasmic helix (α1); Arg/Lys89, Arg95, Arg97
and Lys101 within or near the predicted strand β3/strand β4 THP-binding domain region;
Lys233/Lys235/Arg236 near the PE-binding domain; Lys263 located near the β10 strand; Arg276
within the β11 strand and adjacent to a disulphide bond; Lys288, which lies between the predicted β11 and
β12 strands; Lys337 and Arg/Lys340 near a disulphide bond; Lys388/Arg389 near the predicted β15
strand; and Lys401/Lys409 within the last exoplasmic helix (α5). Two domains of the exoplasmic
CD36 sequence have been potentially implicated in the binding and endocytosis of apoptotic
neutrophils: residues 155-183; and 93-120 (see [7]). The latter domain is called CLESH (for CD36
LIMP-II Emp [erythrocyte membrane protein] sequence homology) and is predominantly
conserved, particularly near Thr92, which is phosphorylated by protein kinase C alpha and contributes
to the suppression of thrombospondin-1 binding in vitro [48]. One or more of these positively charged
CD36 exoplasmic regions may contribute to long chain fatty acid binding prior to the translocation of
fatty acids inside the cell membrane. There are also several conserved acidic amino acid regions,
particularly a sequence of three acidic amino acids (367Asp/368Asp/369Asp) near the predicted β13
strand. The conserved nature of these CD36 charged residues suggests that they play key functional roles
for this cell membrane protein, which may include serving as the long chain fatty acid CD36 receptor site.
2.7. Alignments of Human CD36, SCARB1 and SCARB2

The amino acid sequences for human CD36, SCARB1 and SCARB2 (see Table 1) are aligned in
Figure 2. The sequences were 30-33% identical and showed similarities in several key features and
residues, including cytoplasmic N-terminal and C-terminal residues; N-terminal and C-terminal
trans-membrane helical regions; exoplasmic disulfide bond forming residues, previously identified for
bovine CD36: Cys243-Cys311; Cys272-Cys333; and Cys313-Cys322 [47]; several predicted
N-glycosylation sites for human CD36 (10 sites), SCARB1 (9 sites) and SCARB2 (9 sites), of which
only two are shared between these sequences (N-glycosylation sites 15 and 21) (Table 2); and similar
predicted secondary structures previously identified for SCARB1 [55] (Figure 1). The Cys384 residue,
for which the free-SH group plays a major role in SCARB1-mediated lipid transport [57], was unique
to SCARB1, being replaced by other residues for the corresponding CD36 and SCARB2 proteins
(Phe383 and Ala379, respectively). N-terminal trans-membrane glycine residues, which play a role in
the formation of SCARB1 oligomers [53], were also observed for the human CD36 sequence, with
twin-glycines (Gly23-Gly24) conserved for the vertebrate CD36 sequences (Figure 1). In contrast,
only one of these glycines (Gly10) was observed for the human SCARB2 sequence. These results
suggest that human CD36, SCARB1 and SCARB2 proteins share several important properties, features
and conserved residues, including being membrane-bound with cytoplasmic and transmembrane regions
and having similar secondary structures, while being sufficiently different to serve distinct functions.

Alignments were also prepared for the predicted lancelet (Branchiostoma floridae) and sea squirt 
(Ciona intestinalis) CD36-like sequences and a major epithelial membrane protein (EMP) from fruit 
fly (Drosophila melanogaster) (FBpp0072309) with the human CD36, SCARB1 and SCARB2 
sequences (Figure 2). The lancelet, sea squirt and fruit fly sequences examined shared many features 
with the CD36-like human sequences, including the N- and C-terminal cytoplasmic and transmembrane 
sequences; similarities in predicted secondary structures; positional identities for five conserved 
cysteine residues, indicating conservation of at least 2 disulfide bridges for these proteins; predicted 
N-glycosylation sites, including one which is shared across all 6 CD36-like sequences (site 15 in Table 2); 
and trans-membrane glycine residues, which were observed in both the N- and C-terminal sequences. 
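
The percentage identities quoted in this section are the share of aligned positions that carry the same residue. A minimal Python sketch of that calculation follows. It is not the procedure used to produce Figure 2 or Supplementary Table 1; the function name `percent_identity` and the toy sequences are hypothetical, and skipping every column in which either sequence has a gap is an assumed convention (dividing by the full alignment length or by the shorter sequence would give different values).

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumed convention: gapped columns are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, not taken from Figure 2).
toy_a = "MGCD-RNCGL"
toy_b = "MGCTARN-GL"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

For the toy pair, eight columns are ungapped and seven of them match, so the function reports 87.5%.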

2.8. Gene Locations and Exonic Structures for Vertebrate CD36 Genes 

Table 1 summarizes the predicted locations for vertebrate CD36 genes based upon BLAT 
interrogations of several vertebrate genomes using the reported human CD36 sequence [45] and the 
predicted sequences for other vertebrate CD36 genes and the UC Santa Cruz genome browser [64]. 
Vertebrate CD36 genes were transcribed on either the positive strand (e.g., human, chimpanzee, 
gibbon, rhesus, rat and dog genomes) or the negative strand (e.g., mouse, cow, pig, opossum, chicken, 
frog and zebrafish genomes). Figure 1 summarizes the predicted exonic start sites for human, mouse, 
cow, opossum, chicken, frog and zebrafish CD36 genes with each having 12 coding exons, in identical 
or similar positions to those reported for the human CD36 gene [28]. 
Figure 2. Amino Acid Sequence Alignments for Human CD36, SCARB1, and SCARB2; 
and Lancelet, Sea Squirt and Fruit Fly CD36-like Sequences. See Table 1 for sources of 
CD36-like sequences; HuCD36-human CD36; HuSCA1-human SCARB1; HuSCA2-human SCARB2; 
LaCD36-lancelet CD36; SsCD36-sea squirt CD36; DmEMP1-fruit fly 
epithelial membrane protein; * shows identical residues for subunits; : similar alternate 
residues; . dissimilar alternate residues; predicted cytoplasmic residues are shown in red; 
predicted trans-membrane residues are shown in blue; N-glycosylated and potential 
N-glycosylated Asn sites are shown in green; free-SH Cys involved in lipid transfer for 
human SCARB1 is shown in [...]; predicted disulfide bond Cys residues are shown in 
blue; predicted α-helices for CD36-like sequences are in shaded yellow and numbered in 
sequence from the start of the predicted exoplasmic domain; predicted β-sheets are in 
shaded grey and also numbered in sequence; bold underlined font shows residues 
corresponding to known or predicted exon start sites; exon numbers are shown; boxed G residues 
refer to conserved glycines in the N- and C-terminal oligomerisation domains of the 
trans-membrane sequence [49]; C-terminal SCARB1 AKL residues refer to PDZ-binding 
domain sequences [18,19]. 

Figure 3 shows the predicted structures of mRNAs for two major human CD36 transcripts and the 
major Cd36 transcripts for mouse and rat Cd36 genes [46,65,66]. The human transcripts were ~2 kb 
in length, with 14 (isoform c) or 15 (isoform e) introns present for these CD36 mRNA transcripts, and 
in each case a 3'-untranslated region (UTR) was observed. The human CD36 genome sequence 
contained a number of predicted transcription factor binding sites (TFBS), including the dual promoter 
structure of PPARA (peroxisome proliferator-activated receptor-α) and PPARG (peroxisome 
proliferator-activated receptor-γ) sites [67,68]. Moreover, the mouse Cd36 gene is regulated in a 
tissue-specific manner by PPARA in liver and by PPARG in adipose tissues [69]. Other TFBS predicted 
for the human CD36 5' promoter region included RSRFC4, a myocyte enhancer factor 2A found in 
muscle-specific and 'immediate early' genes [70]; CART1, a paired-class homeodomain transcription 
factor [71]; FOXJ2, a forkhead transcriptional activator which is active during early development [72]; 
XBP1, a transcription factor which is critical for cell fate determination in response to endoplasmic 
reticulum stress [73]; and CDC5, a transcription activator and cell cycle regulator [74]. Hepatic 
upregulation of CD36 transcription in human patients has recently been shown to be significantly 
associated with insulin resistance, hyperinsulinaemia and increased steatosis in non-alcoholic 
steatohepatitis and chronic hepatitis C [43]. 

Figure 3. Gene Structures and Major Splicing Transcripts for the Human, Mouse and Rat 
CD36 Genes. Derived from the AceView website (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/); 
mature isoform variants are shown with capped 5'- and 3'-ends 
for the predicted mRNA sequences [62]; NM refers to the NCBI reference sequence; exons 
are in pink; the directions for transcription are shown as 5' → 3'; sizes of mRNA 
sequences are shown in kilobases (kb); predicted transcription factor binding sites (TFBS) 
for human CD36 are shown: CART1-a paired-class homeodomain transcription factor [71]; 
RSRFC4-myocyte enhancer factor 2A transcription factor [70]; XBP1-transcription 
factor [73]; FOXJ2-forkhead transcription factor [72]; CDC5-transcription activator and 
cell cycle regulator [74]; PPARA-peroxisome proliferator-activated receptor alpha; and 
PPARG-peroxisome proliferator-activated receptor gamma [67,68]. 

Human CD36 5'->3' chromosome 7:79,836,828-80,146,532 size=309.7kb on plus strand 
6.6 times average gene expression level 
Predicted transcription factor binding sites: CART1, RSRFC4, XBP1, FOXJ2, CDC5, PPARA, PPARG 

Mouse Cd36 5'->3' chromosome 5:17,394,781 to 17,287,508 size=107.3kb on minus strand 
4.2 times average gene expression level 

Rat Cd36 5'->3' chromosome 4:13,065,071 to 13,525,617 size=60.5kb on plus strand 
0.3 times average gene expression level 
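
The gene sizes given in the Figure 3 panels follow directly from the chromosomal coordinates. The short Python sketch below recomputes them as a consistency check; the helper name `span_kb` and the dictionary labels are hypothetical, and the calculation simply takes the absolute difference of the two coordinates, which is an assumption about how the sizes were derived.

```python
def span_kb(start: int, end: int) -> float:
    """Length in kilobases between two genomic coordinates, given in either order."""
    return abs(end - start) / 1000.0


# Coordinates as printed in the Figure 3 panels.
loci = {
    "human CD36 (chr7, plus strand)": (79_836_828, 80_146_532),
    "mouse Cd36 (chr5, minus strand)": (17_394_781, 17_287_508),
}

for name, (start, end) in loci.items():
    print(f"{name}: {span_kb(start, end):.1f} kb")
```

This reproduces the stated 309.7 kb (human) and 107.3 kb (mouse). Applied to the rat coordinates as printed (13,065,071 to 13,525,617), the same function gives about 460.5 kb rather than the stated 60.5 kb, so either the coordinates or the size for rat may have lost a character.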
2.9. Comparative Human and Mouse CD36 Tissue Expression 

Figure 4 presents 'heat maps' showing comparative gene expression for various human and mouse 
tissues obtained from GNF Expression Atlas Data using the U133A and GNF1H (human) and GNF1M 
(mouse) chips (http://genome.ucsc.edu; http://biogps.gnf.org) [75]. These data supported a broad and 
high level of tissue expression for human and mouse CD36, particularly for adipose tissue, heart, 
skeletal muscle and liver, which is consistent with previous reports for these genes [11,32,66]. Overall, 
human and mouse CD36 tissue expression levels were 4-6 times the average level of gene expression, 
which supports the key role played by this protein in fatty acid metabolism, especially in liver, muscle 
and adipose tissue. 

Figure 4. Comparative Tissue Expression for Human and Mouse CD36 Genes. Expression 
'heat maps' (GNF Expression Atlas 2 data) (http://biogps.gnf.org) were examined for 
comparative gene expression levels among human and mouse tissues for CD36 genes 
showing high (red); intermediate (black); and low (green) expression levels [75]. Derived 
from human and mouse genome browsers (http://genome.ucsc.edu) [64]. 

GNF Expression Atlas 2 Data From Human U133A and GNF1H Chips for CD36 




GNF Expression Atlas 2 Data from GNF1M Mouse Chip for Cd36 
The broad tissue distribution and high level of gene expression reported for human and mouse CD36 reflect 
key roles for this major cell membrane and muscle outer mitochondrial membrane glycoprotein in fatty 
acyl translocation and as a multiple ligand cell surface receptor of oxidized low density lipoproteins (ox-LDL) 
and long chain fatty acids [7,11,33,66]. CD36 has also been described as a lipid 'sensor' playing a 
lipid receptor role for cells and tissues of the body [8,40]. Moreover, CD36 upregulation is associated 
with insulin resistance and hyperinsulinaemia, leading to liver pathology and increased steatosis [43]. 
In addition, cardiomyocyte CD36 cell surface recruitment is induced by insulin, AMP-activated 
protein kinase (AMPK) activity or contraction, and is regulated in its vesicular trafficking by the 
RabGAP-AS160 substrate and AS160-Rab8a GTPase activating protein (GAP) [76-78]. These 
features provide a link between cell membrane CD36 and the reported insulin-stimulated 
phosphorylation of AS160 involved with the translocation of the glucose transporter GLUT4 to the 
plasma membrane [79,80]. Plasma levels of soluble CD36 are also increased 
in type 2 diabetic patients [81]. 

Significant levels of CD36 expression have also been described in brain tissues, where CD36 
contributes to cerebrovascular oxidative stress and neurovascular dysfunction induced by amyloid-beta 
in Alzheimer's dementia [12,13], and to the transport of long chain fatty acids across the blood-brain 
barrier [82]. 

2.10. Phylogeny of Vertebrate CD36-Like Sequences 

A phylogenetic tree (Figure 5) was calculated by the progressive alignment of 21 vertebrate CD36 
amino acid sequences with human, mouse, chicken and zebrafish SCARB1 and SCARB2 sequences. 
The phylogenetic tree was 'rooted' with the lancelet (Branchiostoma floridae) CD36 sequence (see 
Table 1). The phylogenetic tree showed clustering of the CD36 sequences into groups which were 
consistent with their evolutionary relatedness, as well as groups for human, mouse, chicken and 
zebrafish SCARB1 and SCARB2 sequences, which were distinct from the lancelet CD36 sequence. 
These groups were significantly different from each other (with bootstrap values of ~100/100), with the 
clustering observed supporting a closer phylogenetic relationship between CD36 and SCARB2, and with 
the SCARB1 gene being more distantly related. This is suggestive of a sequence of CD36-like gene 
duplication events: ancestral CD36 gene duplication → SCARB1 and CD36 genes; followed by a 
further CD36 duplication, generating the SCARB2 and CD36 genes found in all vertebrate species 
examined (Figure 5). It is apparent from this study of vertebrate CD36-like genes and proteins that this 
is an ancient protein for which a proposed common ancestor for the CD36, SCARB1 and SCARB2 
genes may have predated the appearance of fish >500 million years ago [83]. In parallel with the 
evolution of CD36 and other CD36-like proteins (SCARB1 and SCARB2), thrombospondins (TSPs) 
have also undergone evolutionary changes in their structures and functions [84], with gene duplication 
events proposed at the origin of deuterostomes. 

3. Methods 

3.1. Vertebrate CD36 Gene and Protein Identification 

BLAST (Basic Local Alignment Search Tool) studies were undertaken using web tools from the 
National Center for Biotechnology Information (NCBI) (http://blast.ncbi.nlm.nih.gov/Blast.cgi) [85]. 
Protein BLAST analyses used vertebrate CD36 amino acid sequences previously described [8,45] 
(Table 1). Non-redundant protein sequence databases for several vertebrate genomes were examined 
using the blastp algorithm from sources previously described [55]. This procedure produced multiple 
BLAST 'hits' for each of the protein databases which were individually examined and retained in 
FASTA format, and a record kept of the sequences for predicted mRNAs and encoded CD36-like 
proteins. Predicted CD36-like protein sequences were obtained in each case and subjected to analyses 
of predicted protein and gene structures. 
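
Since the BLAST hits were retained in FASTA format, a small reader is enough to bring them into later analyses. The Python sketch below is illustrative only and is not part of the original workflow; the function name `parse_fasta`, the record identifiers and the toy sequences are hypothetical, and using the first whitespace-delimited token of each header as the record identifier is an assumed convention.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier (first token after '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Toy input (hypothetical identifiers and residues).
example = [">hit_1 toy record", "MGCD", "RNCG", "", ">hit_2", "MGCT"]
for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```

In practice the same function can be given an open file handle, since a text file iterates line by line.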
Figure 5. Phylogenetic Tree of Vertebrate CD36 Amino Acid Sequences with Human, 
Mouse, Chicken and Zebrafish SCARB1 and SCARB2 Sequences. The tree is labeled with 
the CD36-like gene name and the name of the animal and is 'rooted' with the lancelet 
CD36 sequence. Note the 3 major clusters corresponding to the CD36, SCARB1 and 
SCARB2 gene families. A genetic distance scale is shown. The number of times a clade 
(sequences common to a node or branch) occurred in the bootstrap replicates is shown. 
Only highly significant replicate values of 95 or more are shown, with 100 bootstrap 
replicates performed in each case. A proposed sequence of CD36 gene duplication events 
is shown. 
BLAT (BLAST-like Alignment Tool) analyses were subsequently undertaken for each of the predicted 
CD36 amino acid sequences using the UC Santa Cruz Genome Browser [64] with the default settings 
to obtain the predicted locations for each of the vertebrate CD36 genes, including predicted exon 
boundary locations and gene sizes. BLAT analyses were similarly undertaken for vertebrate SCARB1 
and SCARB2 genes using previously reported sequences in each case (see Table 1). Structures for 
the major isoforms (splicing variants) of human CD36, mouse Cd36 and rat Cd36 were 
obtained using the AceView website to examine predicted gene and protein structures [66]. 

3.2. Predicted Structures and Properties of Vertebrate CD36 



Predicted secondary structures for vertebrate CD36 proteins, human SCARB1 and SCARB2, 
lancelet (Branchiostoma floridae) CD36, sea squirt (Ciona intestinalis) CD36 and a fruit fly 
(Drosophila melanogaster) epithelial membrane protein (FBpp0072309) were obtained using the 
PSIPRED v2.5 web site tools provided by Brunel University [86]. Molecular weights, N-glycosylation 
sites [49] and predicted trans-membrane, cytosolic and exocellular sequences for vertebrate CD36 
proteins were obtained using Expasy web tools (http://au.expasy.org/tools/pi_tool.html). 
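
Potential N-glycosylation sites are conventionally recognised by the consensus sequon Asn-X-Ser/Thr, where X is any residue except proline. The Python sketch below scans a sequence for that motif. It is a simplified stand-in and not the Expasy tool used in this study; the function name `predicted_n_glyc_sites` and the toy peptide are hypothetical, and the presence of a sequon indicates only a candidate site, not confirmed glycosylation.

```python
import re
from typing import List

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") each be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def predicted_n_glyc_sites(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Toy peptide (hypothetical, not a CD36 fragment).
toy = "MNGSANPTQNNTS"
print(predicted_n_glyc_sites(toy))
```

For the toy peptide the candidate positions are 2, 10 and 11; the Asn at position 6 is skipped because it is followed by proline.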

3.3. Comparative Human and Mouse CD36 Gene Expression 

The genome browser (http://genome.ucsc.edu) [62] was used to examine GNF Expression Atlas 2 
data using various expression chips for human and mouse CD36 genes (http://biogps.gnf.org) [74]. 
Gene array expression 'heat maps' were examined for comparative gene expression levels among 
human and mouse tissues showing high (red); intermediate (black); and low (green) expression levels. 

3.4. Phylogeny Studies and Sequence Divergence 

Alignments of vertebrate CD36, SCARB1 and SCARB2 sequences were assembled using BioEdit 
v.5.0.1 and the default settings [87]. Alignment ambiguous regions, including the amino and carboxyl 
termini, were excluded prior to phylogenetic analysis, yielding alignments of 431 residues for 
comparisons of vertebrate CD36 sequences with human, mouse, chicken and zebrafish SCARB1 and 
SCARB2 sequences and the lancelet (Branchiostoma floridae) CD36 sequence (Table 1). 
Evolutionary distances and phylogenetic trees were calculated as previously described [85]. Tree 
topology was reexamined by the bootstrap method of resampling (100 bootstraps were applied) and 
only values that were highly significant (≥95) are shown [88]. 
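
The resampling step of the bootstrap can be sketched in a few lines of Python: alignment columns are drawn with replacement to the original length, and a grouping is scored by how many replicates recover it. The sketch below is not the tree-building procedure used for Figure 5. As a simplifying assumption, a nearest-neighbour check on uncorrected p-distances stands in for clade recovery in a full tree, and all names (`resample_columns`, `p_distance`, `pair_support`) and the toy alignment are hypothetical.

```python
import random
from typing import Dict


def resample_columns(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Draw alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pair_support(alignment: Dict[str, str], first: str, second: str,
                 replicates: int = 100, seed: int = 0) -> int:
    """Count replicates in which `second` is the closest sequence to `first`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        sample = resample_columns(alignment, rng)
        others = [name for name in sample if name != first]
        # On ties, min() keeps the earliest name in the alignment's order.
        nearest = min(others, key=lambda name: p_distance(sample[first], sample[name]))
        if nearest == second:
            hits += 1
    return hits


# Toy gap-free alignment (hypothetical sequences).
toy = {"A": "MGCDRNCGLI", "B": "MGCDRNCGLV", "C": "WWWWWWWWWW"}
print(pair_support(toy, "A", "B", replicates=100, seed=1), "of 100 replicates")
```

In the toy alignment A and B differ at a single column while C shares no residue with A, so B is the nearest neighbour of A in every replicate; with real data the count varies, and values of 95 or more out of 100 correspond to the threshold reported above.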

4. Conclusions 

The results of this study indicate that vertebrate CD36 genes and encoded proteins represent a 
distinct gene and protein family of CD36-like proteins which share key conserved sequences that have 
been reported for other CD36-like proteins (SCARB1 and SCARB2) previously studied [16-24]. 
CD36 has a unique property among these proteins in serving a major role in fatty acyl translocation 
and as a multiple ligand cell surface receptor of oxidized low density lipoproteins (ox-LDL), long chain fatty 
acids, aged neutrophils and Plasmodium falciparum-parasitized erythrocytes [3-10]. CD36 is encoded 
by a single gene among the vertebrate genomes studied and is highly expressed in human and mouse 
tissues, particularly in adipose tissue, heart, skeletal muscle and liver; vertebrate CD36 genes usually 
contain 12 coding exons. Predicted secondary structures for vertebrate CD36 proteins showed strong similarities with 
other CD36-like proteins, SCARB1 and SCARB2. Three major structural domains were observed for 
vertebrate CD36 sequences, including N- and C-terminal cytoplasmic domains; N- and C-terminal 
trans-membrane domains; and an exoplasmic domain, which serves as the 'receptor' for long chain 
fatty acids and thrombospondins [5-8,14,32]. The latter domain contained three disulfide bridges [49]; 
several N-glycosylation sites for glycan binding (7-10 sites), which are essential for membrane 
recruitment [52]; 17 conserved proline and 14 glycine residues, which may contribute to short loop 
structures for the CD36 exoplasmic structure; and several conserved basic amino acid sites, which may 
promote long chain fatty acid binding. Phylogenetic studies using 21 vertebrate CD36 sequences with 
human, mouse, chicken and zebrafish SCARB1 and SCARB2 sequences indicated that the CD36 gene 
appeared early in evolution, prior to the appearance of bony fish more than 500 million years ago, and 






has undergone at least two gene duplication events: ancestral CD36 → vertebrate SCARB1 and CD36; 
with the latter gene undergoing a further gene duplication generating vertebrate CD36 and SCARB2 genes. 
Supplementary Figure 1. Amino Acid Sequence Alignments for Vertebrate CD36 
Exoplasmic Sequences. Amino acids are color coded: yellow for proline (P); 
green for hydrophilic amino acids, S (serine), Q (glutamine), N (asparagine), and 
T (threonine); brown for glycine (G); light blue for hydrophobic amino acids, L (leucine), 
I (isoleucine), V (valine), M (methionine), W (tryptophan); dark blue for amino acids 
Y (tyrosine) and H (histidine); purple for acidic amino acids, E (glutamate) and 
D (aspartate); and red for basic amino acids, K (lysine) and R (arginine); conserved 
prolines and glycines are designated as P1, P2 etc. and G1, G2 etc., respectively. Numbers 
refer to the human CD36 amino acid sequence. 
Supplementary Table 1. CD36, SCARB1 and SCARB2 proteins: subunit MWs and 
percentage identities. High % identities are in bold. 

Gene          Species                     Subunit   % Identity with human
                                          MW        SCARB1   SCARB2   CD36

CD36 Gene
Human         Homo sapiens                53,053    31       30       100
Chimpanzee    Pan troglodytes             53,064    31       30       100
Orangutan     Pongo abelii                53,039    32       30       97
Gibbon        Nomascus leucogenys         53,161    32       30       96
Rhesus        Macaca mulatta              53,041    32       31       94
Mouse         Mus musculus                52,698    30       31       83
Rat           Rattus norvegicus           52,731    31       30       86
Guinea Pig    Cavia porcellus             53,085    32       32       81
Cow           Bos taurus                  52,940    32       30       82
Dog           Canis familiaris            52,549    31       30       82
Pig           Sus scrofa                  53,085    31       30       82
Rabbit        Oryctolagus cuniculus       52,729    31       31       88
Horse         Equus caballus              52,789    31       31       83
Elephant      Loxodonta africana          52,873    31       31       80
Opossum       Monodelphis domestica       53,017    30       30       73
Platypus      Ornithorhynchus anatinus    52,807    31       30       73
Chicken       Gallus gallus               52,624    30       32       61
Lizard        Anolis carolinensis         52,890    31       31       61
Frog          Xenopus tropicalis          52,696    30       29       55
Zebrafish     Danio rerio                 51,590    31       31       53

SCARB1 Gene
Human         Homo sapiens                56,973    100      29       31
Mouse         Mus musculus                56,754    79       29       29
Chicken       Gallus gallus               55,918    57       28       31
Zebrafish     Danio rerio                 55,742    51       28       30

SCARB2 Gene
Human         Homo sapiens                54,290    29       100      30
Mouse         Mus musculus                54,044    29       85       31
Chicken       Gallus gallus               53,907    30       59       33
Zebrafish     Danio rerio                 60,234    31       43       33

CD36 Gene
Lancelet      Branchiostoma floridae      54,141    34       35       35
Sea squirt    Ciona intestinalis          58,009    26       33       31
Nematode      Caenorhabditis elegans      60,182    21       26       24
Fruit fly     Drosophila melanogaster     58,663    20       23       26
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